A Word-level Morphosyntactic Analyzer for Basque
Abstract
This work presents the development and implementation of a full morphological analyzer for Basque, an agglutinative language. Several problems (phrase structure inside word-forms, noun ellipsis, multiplicity of values for the same feature and the use of complex linguistic representations) have forced us to go beyond the morphological segmentation of words, and to include an extra module that performs a full morphosyntactic parsing of each word-form. A unification-based word-level grammar has been defined for that purpose. The system has been integrated into a general environment for the automatic processing of corpora, using TEI-conformant SGML feature structures.
Similar resources
Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words
Automatic lemmatization is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma (base form) to each word in a running text is not trivial, since for instance, nouns inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, sin...
Exploring Treebank Transformations in Dependency Parsing
This paper presents a set of experiments performed on parsing the Basque Dependency Treebank. We have concentrated on treebank transformations, maintaining the same basic parsing algorithm across the experiments. The experiments can be classified in two groups: 1) feature optimization, which is important mainly due to the fact that Basque is an agglutinative language, with a rich set of morphos...
A word-grammar based morphological analyzer for agglutinative languages
Agglutinative languages present rich morphology and for some applications they need deep analysis at word level. The work here presented proposes a model for designing a full morphological analyzer. The model integrates the two-level formalism and a unification-based formalism. In contrast to other works, we propose to separate the treatment of sequential and non-sequential morphota...
An Event Related Field Study of Rapid Grammatical Plasticity in Adult Second-Language Learners
The present study used magnetoencephalography (MEG) to investigate how Spanish adult learners of Basque respond to morphosyntactic violations after a short period of training on a small fragment of Basque grammar. Participants (n = 17) were exposed to violation and control phrases in three phases (pretest, training, generalization-test). In each phase participants listened to short Basque phras...
Different Issues in the Design of a Lemmatizer/Tagger for Basque
This paper presents relevant issues that have been considered in the design of a general purpose lemmatizer/tagger for Basque (EUSLEM). The lemmatizer/tagger is conceived as a basic tool necessary for other linguistic applications. It uses the lexical data base and the morphological analyzer previously developed and implemented. Due to the characteristics of the language, the tagset here propos...